Abstract

Given a time series of graphs $G(t) = (V, E(t))$, $t = 1, 2, \ldots$, where the fixed vertex set $V$ represents
"actors" and an edge between vertex $u$ and vertex $v$ at time $t$ ($uv \in E(t)$) represents the existence of a
communications event between actors $u$ and $v$ during the $t$-th time period, we wish to detect anomalies
and/or change points. We consider a collection of graph features, or invariants, and demonstrate that 
adaptive fusion provides superior inferential efficacy compared to naive equal weighting for a certain 
class of anomaly detection problems. Simulation results using a latent process model for time series of 
graphs, as well as illustrative experimental results for a time series of graphs derived from the Enron 
email data, show that a fusion statistic can provide superior inference compared to individual invariants 
alone. These results also demonstrate that an adaptive weighting scheme for fusion of invariants performs 
better than naive equal weighting. 



Index Terms 

Statistical inference on graphs, Time series analysis, Random graphs, Change point detection,
Hypothesis testing, Graph invariants, Fusion.



I. Introduction 

Given a time series of graphs $G(t) = (V, E(t))$, $t = 1, 2, \ldots$, where the vertex set $V = [n] =
\{1, \ldots, n\}$ is fixed throughout and the edge sets $E(t) \subset \binom{V}{2}$ are time-dependent, we wish to
detect anomalies and/or change points. Let the vertices represent "actors," and let an edge between
vertex $u$ and vertex $v$ at time $t$ ($uv \in E(t)$) represent the existence of a communications event between
actors $u$ and $v$ during the $t$-th time period. Thus $E(t)$ represents the collection of (unordered) pairs
of vertices which communicate during $(t-1, t]$. We will not consider directed edges, hyper-graphs
(hyper-edges consisting of more than two vertices), multi-graphs (more than one edge between any
two vertices at any time $t$), self-loops (an edge from a vertex to itself), or weighted edges, although all
of these generalizations of simple graphs may be relevant for specific applications.

Fig. 1. Notional depiction of a time series of graphs in which the entire vertex set $V$ behaves in some null state for
$t = 1, \ldots, t^* - 1$ and then, at time $t^*$, a subset of vertices $V_A$ exhibits a change in connectivity behavior.

The specific anomaly we will consider is the "chatter" alternative: a small (unspecified) subset of
vertices with excessive communication amongst themselves during some time period in an otherwise
stationary setting, as depicted in Figure 1. This figure notionally depicts the entire vertex set $V$ behaving
in some null state for $t = 1, \ldots, t^* - 1$; then, at time $t^*$, a collection of vertices $V_A \subset V$
($|V_A| = m$, $2 \le m \ll n$) exhibits probabilistically higher connectivity. (The remaining
$\binom{n}{2} - \binom{m}{2}$ interconnection probabilities remain in their null state at time $t^*$.) Our statistical
inference task is then to determine whether or not a "chatter" group has emerged at some time $t = t^*$,
as shown in Figure 1.

The latent process model for time series of graphs presented in [1] provides for precisely this temporal
structure. Each vertex is governed by a continuous-time, finite-state stochastic process $\{X_v(t)\}_{v \in V}$, with
the state space given by $\{0, 1, \ldots, K\}$. The probability of edge $uv$ at time $t$ is determined by the inner
product of the sub-probability vectors specified by $\int_{t-1}^{t} I\{X_w(\tau) = k\}\,d\tau$, $k = 1, \ldots, K$, for $w = u, v$.
For the scenario depicted in Figure 1, the vertex processes $\{X_v(t)\}_{v \in V_A}$ are stationary until time $t^* - 1$
and then undergo a change point, while the processes $\{X_v(t)\}_{v \in V \setminus V_A}$ remain stationary throughout all
time.

In [1], the model produces a dependent time series of graphs $G(t)$, each of which is itself a latent
position model with conditionally independent edges given $\{X_v(\tau)\}_{v \in V, \tau \le t}$. The model allows two
simplifying approximations: a second-order (central limit theorem) approximation with temporally independent
random graphs, each of which is itself a random dot product ([2], [3], and Section 16.4 in [4]) latent
position model [5], and a first-order (law of large numbers) approximation with temporally independent
random graphs, each of which is itself an independent edge random graph model [6].

Fig. 2. The "kidney-egg" random graph model, denoted $\kappa(n,p,m,q)$. The small "egg" represents the $m$ vertices ($V_A$) that
exhibit chatter (each edge occurring with probability $q$). The "kidney" is the population of $n - m$ vertices which are not exhibiting
chatter (each edge occurring with probability $p < q$). Edges between a vertex in the kidney and a vertex in the egg occur with
probability $p$. When $m = 0$ or $q = p$, this model degenerates to $ER(n,p)$.

The simplicity of the first-order approximation, depicted in Figure 2 for the special case of homogeneity
vs. kidney-egg, provides a useful framework for description. If the vertex processes $\{X_v(t)\}_{v \in V}$ are
independent and identical, with stationary probability vector $\pi_0 = [\pi_{0,0}, \pi_{0,1}, \ldots, \pi_{0,K}]'$, then the
first-order approximation produces a temporally independent series of homogeneous independent edge
Erdős-Rényi random graphs (denoted by $ER(n,p)$) with $p = \langle \bar\pi_0, \bar\pi_0 \rangle$, where
$\bar\pi_0 = [\pi_{0,1}, \ldots, \pi_{0,K}]'$. The vertex processes $\{X_v(t)\}_{v \in V_A}$ change at time $t^* - 1$, taking on
stationary probability vector $\pi_A$, so that $G(t^*)$ is a kidney-egg independent edge $\kappa(n,p,m,q)$ random graph
with $q = \langle \bar\pi_A, \bar\pi_A \rangle$. The idea that the change point consists of a small collection of vertices
exhibiting excessive interconnection probability results in the restriction of this model to the case $q > p$.
(Here we have assumed, for simplicity, that the geometry provides $\langle \bar\pi_0, \bar\pi_A \rangle = p$.)¹
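As a concrete illustration of the first-order approximation, the following minimal sketch draws a single kidney-egg graph with independent edges. It assumes NumPy; the function name `sample_kidney_egg` and the convention that the first $m$ vertices form the egg are illustrative choices, not part of the model in [1].

```python
import numpy as np

def sample_kidney_egg(n, p, m, q, rng):
    """Draw one adjacency matrix from kappa(n, p, m, q).

    Assumption for illustration: vertices 0..m-1 form the egg.
    """
    probs = np.full((n, n), float(p))   # kidney-kidney and kidney-egg pairs use p
    probs[:m, :m] = q                   # egg-egg pairs use q
    draws = rng.random((n, n)) < probs  # independent Bernoulli draws
    upper = np.triu(draws, k=1)         # keep one draw per unordered pair, no self-loops
    return (upper | upper.T).astype(int)

rng = np.random.default_rng(0)
A = sample_kidney_egg(n=50, p=0.01, m=6, q=0.3, rng=rng)
```

Setting `m = 0` or `q = p` in this sketch reduces it to an $ER(n,p)$ draw, matching the degenerate cases of the kidney-egg model.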

In [7], the scan statistic graph invariants are introduced and applied to the problem of detecting "chatter"
anomalies in time series of Enron graphs. In [8], various graph invariants (size, maximum degree, etc.)
are considered for their power as test statistics in testing $H_0: ER(n,p)$ vs. $H_A: \kappa(n,p,m,q)$. It is
demonstrated that no single invariant is uniformly most powerful. See also [9].

¹ If $\langle \bar\pi_0, \bar\pi_A \rangle = p' \ge p$, then we have $E[\deg(v)] = mq + (n-m)p'$ for a $v \in$ egg, and
$E[\deg(v)] = (n-m)p + mp'$ for a $v \in$ kidney. The difference between these two expected degrees is then
$m(q - p') + (n-m)(p' - p)$. If $m$ is of order $o(n)$, we see that the above expression is minimized over $p' \ge p$
when $p' = p$, which indicates that the most difficult scenario is when $p' = p$.

In [10], the principal eigenvector of a matrix based on the graph is tracked over time, and an anomaly
is declared to be present if its direction changes by more than some threshold. The authors of [11]
address dynamic network analysis problems from a signal processing perspective, such as detecting
anomalies or distinct subgraphs in a large, noisy background. Recently, [12] proposed a methodology for
detecting anomalous graphs by examining distributions of vertex invariants instead of using a single graph
invariant, using simulated ER random graphs that do not form a time series. In [13], a locality statistic
based on a generalized likelihood ratio test statistic (which the authors call a scan statistic) is applied to
online network intrusion detection. Other notable recent efforts in this direction include [14]-[16].

In this paper, we consider the problem of detecting "chatter" anomalies in time series of graphs using 
combinations of invariants. We present experimental results for anomaly detection on time series of 
simulated data from the model in [1], as well as an investigation of a time series of graphs extracted 
from the Enron email corpus, to demonstrate that a statistic which combines multiple invariants can 
provide superior inference compared to individual invariants alone. We further demonstrate an adaptive 
weighting scheme for fusion of invariants that performs better than naive equal weighting. 

Section II presents the graph features (invariants, used as statistics) considered herein, Section III 
introduces our adaptive fusion, and Section IV presents results with simulated data as well as Enron 
email data. We conclude with discussion in Section V. 

II. Graph Features 

We investigate a collection of nine graph features similar to that considered in [8]: size, maximum 
degree, maximum average degree (eigenvalue approximation), scan statistic (scale 1,2,3), number of 
triangles, clustering coefficient, and (negative) average path length. In all cases, a large value of the
feature $F$ is evidence in favor of excessive interconnection probability.

A. Invariants 

1) Size: The size of a graph is the number of edges in the graph, given by 

$$F_1(G) = \mathrm{size}(G) = |E(G)|.$$
This is the simplest global graph statistic. 
2) Maximum Degree: The maximum degree $\Delta(G)$ of a graph is given by

$$F_2(G) = \Delta(G) = \max_{v \in V} \deg(v),$$

where $\deg(v)$ is the degree of vertex $v$. This is the simplest localized graph feature.

3) Maximum Average Degree: The maximum average degree of a graph is the maximum over all
subgraphs $H$ of $G$ of the average degree of $H$. If $\deg(v)$ is the degree of vertex $v$, then the average
degree of a graph $G = (V,E)$ is given by

$$d(G) = \frac{1}{|V|} \sum_{v \in V} \deg(v) = \frac{2\,\mathrm{size}(G)}{\mathrm{order}(G)},$$

where $\mathrm{order}(G) = |V|$, the number of vertices. Thus the maximum average degree is given by

$$\mathrm{MAD}(G) = \max_{H \subseteq G} d(H),$$

where the maximum is over all (induced) subgraphs $H$ of $G$.

Since $\mathrm{MAD}(G)$ is difficult to compute exactly [17], we resort to an eigenvalue approximation. $\mathrm{MAD}(G)$
is bounded above by the largest eigenvalue of the adjacency matrix of $G$, denoted $\mathrm{MAD}_e(G)$, and we use

$$F_3(G) = \mathrm{MAD}_e(G).$$

As demonstrated in [8], the eigenvalue method appears to be strictly better at detecting increased local
activity than the greedy approximation method of [17] (Problem 5.7.2, page 90).
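The eigenvalue bound can be checked on a toy graph. The sketch below, assuming NumPy and a symmetric 0/1 adjacency matrix, computes the largest adjacency eigenvalue and compares it with the average degree of the whole graph and of its densest subgraph; the helper names `mad_eigen` and `average_degree` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def mad_eigen(A):
    """Largest eigenvalue of a symmetric adjacency matrix (upper bound on MAD)."""
    return float(np.linalg.eigvalsh(A)[-1])  # eigvalsh returns eigenvalues in ascending order

def average_degree(A):
    """Average degree, 2 * size / order, of the graph with adjacency matrix A."""
    return A.sum() / A.shape[0]

# Toy graph: a 4-clique on vertices 0..3 plus four isolated vertices.
A = np.zeros((8, 8), dtype=int)
A[:4, :4] = 1 - np.eye(4, dtype=int)

whole = average_degree(A)            # 12 / 8 = 1.5
clique = average_degree(A[:4, :4])   # 12 / 4 = 3.0, the densest subgraph in this toy graph
bound = mad_eigen(A)                 # approximately 3.0, so the bound is tight here
```

In this example the global average degree understates the local activity in the clique, while the largest eigenvalue recovers it, which is the behavior that motivates using the eigenvalue as a localized feature.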

4) Scan Statistic: Scan statistics [7] are graph features based on local neighborhoods of the graph.
We will consider the scan statistic $SS_k(G)$ to be the maximum number of edges over all $k$-th order
neighborhoods, where the $k$-th order neighborhood of a vertex $v$, $N_k[v]$, is the set of vertices whose graph
shortest path distance from $v$ is less than or equal to $k$. We will consider $k \in \{1,2,3\}$, where $SS_k(G)$ is
given by

$$F_{3+k}(G) = SS_k(G) = \max_{v \in V} \mathrm{size}(\Omega(N_k[v])),$$

where $\Omega(N_k[v])$ denotes the subgraph induced by $N_k[v]$.
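A direct, unoptimized way to evaluate the scan statistic is to grow each closed neighborhood one hop at a time and count the edges of the induced subgraph. The following sketch assumes NumPy and a symmetric 0/1 adjacency matrix with zero diagonal; the function name `scan_statistic` is illustrative, and this is not the implementation used in [7].

```python
import numpy as np

def scan_statistic(A, k):
    """Maximum number of edges in the subgraph induced by any k-th order neighborhood."""
    n = A.shape[0]
    step = ((A > 0) | np.eye(n, dtype=bool)).astype(int)  # one hop, including staying put
    reach = np.eye(n, dtype=int)                          # N_0[v] = {v}
    for _ in range(k):
        reach = ((reach @ step) > 0).astype(int)          # vertices within one more hop
    best = 0
    for v in range(n):
        idx = np.flatnonzero(reach[v])                    # members of N_k[v]
        edges = int(A[np.ix_(idx, idx)].sum()) // 2       # each edge is counted twice
        best = max(best, edges)
    return best

# Path graph 0-1-2-3-4.
P = np.zeros((5, 5), dtype=int)
for i in range(4):
    P[i, i + 1] = P[i + 1, i] = 1
ss1, ss2 = scan_statistic(P, 1), scan_statistic(P, 2)
```

On the path graph, the best first-order neighborhood is an interior vertex with its two neighbors (two edges), and the second-order neighborhood of the middle vertex covers the whole path (four edges).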

5) Number of Triangles: We consider the total number of triangles in $G$. If $A$ is the adjacency matrix
for the graph $G$, then the number of triangles is given by

$$F_7(G) = \frac{\mathrm{trace}(A^3)}{6}.$$

The trace is zero if and only if the graph is triangle-free.
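The trace formula works because the diagonal of $A^3$ counts closed walks of length three, and each triangle produces six such walks (three starting vertices, two directions). A minimal NumPy sketch, with the hypothetical helper name `count_triangles`:

```python
import numpy as np

def count_triangles(A):
    """Number of triangles in a simple undirected graph, via trace(A^3) / 6."""
    A = np.asarray(A, dtype=np.int64)
    return int(np.trace(A @ A @ A)) // 6

K4 = 1 - np.eye(4, dtype=int)              # complete graph on 4 vertices
path = np.diag(np.ones(3, dtype=int), 1)   # upper triangle of the path 0-1-2-3
path = path + path.T                       # symmetrize

n_k4 = count_triangles(K4)      # every 3-subset of 4 vertices is a triangle
n_path = count_triangles(path)  # a path is triangle-free
```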
6) Clustering Coefficient: We consider the global clustering coefficient (CC) in $G$, given by

$$F_8(G) = \mathrm{CC}(G) = \frac{ct}{ot},$$

where ct is the number of closed triplets (a subgraph with three vertices and three edges) and ot is 
the number of open triplets (a subgraph with three vertices and at least two edges). This measures the 
probability that the adjacent vertices of a vertex are connected. This is sometimes called the transitivity 
of a graph. 

7) Average Path Length: The average path length (APL) is given by

$$\mathrm{APL}(G) = \frac{\sum_{u \ne v} s(u,v)}{n(n-1)},$$

where $s(u,v)$ is the length of the shortest path between vertices $u$ and $v$. This measures how many steps
are required, on average, to reach every other vertex from a given vertex. Unlike our other invariants, a small
value of the average path length is evidence in favor of excessive interconnection probability, so we use the
negated value

$$F_9(G) = -\mathrm{APL}(G)$$

in this work. (If no path exists between $u$ and $v$, we use $s(u,v) = 2 \max s(u', v')$, where the maximum
is taken over all pairs of vertices that have an existing path between them.)²
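The convention for unreachable pairs can be made concrete with a breadth-first search. The sketch below assumes NumPy, a symmetric 0/1 adjacency matrix, and a sum over ordered pairs $u \ne v$; the function name `negative_apl` and the choice to return 0 for an edgeless graph are assumptions made only for this illustration.

```python
import numpy as np
from collections import deque

def negative_apl(A):
    """Negated average path length, with unreachable pairs set to twice the longest finite distance."""
    n = A.shape[0]
    dist = np.full((n, n), np.inf)
    for s in range(n):                      # breadth-first search from every source vertex
        dist[s, s] = 0.0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(A[u]):
                if np.isinf(dist[s, v]):
                    dist[s, v] = dist[s, u] + 1
                    queue.append(v)
    off = ~np.eye(n, dtype=bool)            # ordered pairs with u != v
    finite = np.isfinite(dist) & off
    longest = dist[finite].max() if finite.any() else 0.0  # assumption: 0 for an edgeless graph
    dist[np.isinf(dist)] = 2 * longest      # convention for disconnected pairs
    return -dist[off].sum() / (n * (n - 1))

connected = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])      # path 0-1-2
disconnected = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])   # edge 0-1 plus an isolated vertex
f9_connected = negative_apl(connected)        # -(1 + 1 + 2) * 2 / 6 = -4/3
f9_disconnected = negative_apl(disconnected)  # -(2 * 1 + 4 * 2) / 6 = -5/3
```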

B. Temporal Normalization 

The purpose of our inference is to detect a local (temporal) behavior change in the time series of graphs. 
In particular, we wish to consider as our alternative hypothesis that a small (unspecified) collection of 
vertices (the "egg") increases their within-group activity at some time t* as compared to recent past while 
the majority of vertices (the "kidney") continue with their normal behavior. The null hypothesis, then, 
is a form of temporal homogeneity - no probabilistic behavior changes in terms of graph features. See 
Figure 3. 

² In fact, the average path length (APL) is inappropriate for sparse (highly disconnected) graphs.

Fig. 3. $H_0$ at $t = t^* - 1$ and $H_A$ at $t = t^*$. The $H_0$ state compares many previous (10 in this case) null graphs to a null
graph, $G(t = 11)$, and the $H_A$ state compares many null graphs to an alternative graph, $G(t = t^* = 12)$.

As mentioned in [7], the raw features $F_i(G(t))$ are standardized using quantities computed from the
recent past:

$$S_i(t) = \frac{F_i(G(t)) - \hat\mu_{i,\ell}(t)}{\hat\sigma_{i,\ell}(t)},$$

where $\hat\mu_{i,\ell}(t)$ and $\hat\sigma_{i,\ell}(t)$ are the running mean and standard deviation estimates of $F_i$ based on the most
recent $\ell$ time steps; that is,

$$\hat\mu_{i,\ell}(t) = \frac{1}{\ell} \sum_{t'=t-\ell}^{t-1} F_i(G(t'))$$

and

$$\hat\sigma^2_{i,\ell}(t) = \frac{1}{\ell - 1} \sum_{t'=t-\ell}^{t-1} \left( F_i(G(t')) - \hat\mu_{i,\ell}(t) \right)^2.$$

Then, a detection at time $t$ is obtained when $S_i(t)$ is large. (Note that for the localized statistics (maximum
degree, maximum average degree, and the scan statistics) we must first perform vertex standardization,
as in [7] Section 6, so that, for an inhomogeneous collection of stationary null vertex processes, the most
active vertices do not dominate these statistics.)
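The temporal normalization can be sketched as a sliding-window standardization of one raw feature series. The code below assumes NumPy, standardizes each value against the mean and standard deviation of the preceding window of `ell` time steps, and uses the sample standard deviation (`ddof=1`); that normalization choice, the name `temporal_normalize`, and the use of NaN where no full window or no variability exists are assumptions of this sketch. Vertex standardization for the localized statistics is not included.

```python
import numpy as np

def temporal_normalize(F, ell):
    """Standardize F[t] by the mean and sd of the previous ell values.

    F is a 1-D sequence of raw feature values, one per time step.
    Entries without a full window, or with a zero-variance window, are NaN.
    """
    F = np.asarray(F, dtype=float)
    S = np.full(F.shape, np.nan)
    for t in range(ell, len(F)):
        window = F[t - ell:t]          # the ell most recent values, excluding time t
        sd = window.std(ddof=1)        # assumption: sample standard deviation
        if sd > 0:
            S[t] = (F[t] - window.mean()) / sd
    return S

raw = [1, 2, 3, 4, 5, 10]              # a jump at the last time step
S = temporal_normalize(raw, ell=5)     # only the last entry has a full window
```

Here the window `[1, 2, 3, 4, 5]` has mean 3 and sample variance 2.5, so the final standardized value is $7/\sqrt{2.5}$, a large positive deviation.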

C. Simulation 

Our general algorithm for implementing the time series of random dot product graphs is presented in
Algorithm II.1. The only difference among the three models in [1] occurs in line 3, where the probability
vectors for vertices are obtained; the first approximation uses fixed (non-random or deterministic)
probability vectors $\pi_0$ and $\pi_A$ so that $\langle \bar\pi_0, \bar\pi_0 \rangle = \langle \bar\pi_0, \bar\pi_A \rangle = p$ and
$\langle \bar\pi_A, \bar\pi_A \rangle = q$, while the second approximation and the exact model use random probability vectors [1].

Density estimates of $S_i(t)$ for all nine features are presented in Figure 4 (using $\ell = 5$). Black
denotes $H_0: S_i(t^* - 1)$ and red denotes $H_A: S_i(t^*)$. As we can see from this figure, all features
have mean zero and variance one (approximately) under $H_0$. Our goal is to measure the performance
of each individual graph feature, and then compare these results with the effectiveness of combining
features, on our statistical inference task.




Algorithm II.1 Time Series of Random Dot Product Graphs
Require: $n$, $\pi_0$, $\pi_A$, $t_{max}$

1: for all time $t$ such that $0 < t \le t_{max}$ do
2:   initialize the $n \times n$ adjacency matrix $A_t$ with zeros
3:   $vp$ ← calculate probability vectors for all vertices using $(\pi_0, \pi_A)$
4:   for all vertex $u$ such that $1 \le u \le n$ do
5:     for all vertex $v$ such that $1 \le v \le n$ do
6:       if $u > v$ then
7:         $e$ ← $\langle vp_u, vp_v \rangle$ {vector dot product}
8:         $A_t[u,v]$ ← $A_t[v,u]$ ← Bernoulli($e$) {draw an edge}
9:       end if
10:    end for
11:  end for
12:  $A[t]$ ← $A_t$
13: end for
14: return $A$, the time series of graphs
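A vectorized Python sketch of Algorithm II.1 for the first approximation is given below. It assumes NumPy, takes the per-vertex sub-probability vectors as precomputed inputs (line 3 of the algorithm), and switches from the null to the alternative vectors at $t^*$; the function name `simulate_rdpg_series`, the two-dimensional example vectors, and the choice to keep the alternative vectors for all $t \ge t^*$ are assumptions of this illustration rather than details taken from [1].

```python
import numpy as np

def simulate_rdpg_series(vp_null, vp_alt, t_star, t_max, rng):
    """Simulate graphs for t = 1..t_max; vp_* are (n, K) arrays of per-vertex vectors."""
    n = vp_null.shape[0]
    series = np.zeros((t_max, n, n), dtype=int)
    for t in range(1, t_max + 1):
        vp = vp_alt if t >= t_star else vp_null   # fixed vectors (first approximation)
        P = vp @ vp.T                             # edge probabilities as dot products
        draws = rng.random((n, n)) < P            # Bernoulli draw for every pair
        upper = np.triu(draws, k=1)               # keep pairs with u < v only
        series[t - 1] = upper | upper.T           # symmetric adjacency, zero diagonal
    return series

n, m = 50, 6
vp_null = np.tile([0.1, 0.0], (n, 1))     # <null, null> = 0.01 = p
vp_alt = vp_null.copy()
vp_alt[:m, 1] = np.sqrt(0.29)             # egg vertices: <alt, alt> = 0.3 = q, <null, alt> = p
rng = np.random.default_rng(0)
graphs = simulate_rdpg_series(vp_null, vp_alt, t_star=12, t_max=12, rng=rng)
```

The example vectors are chosen so that the dot products reproduce $p = 0.01$ and $q = 0.3$ from the simulation settings while the kidney-egg cross probability stays at $p$.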



Comparative power results for the individual features are depicted in Figure 5, with a cumulative color 
bar for each feature. For the most subtle case (when q is small, in blue) the power for each feature is 
relatively low, while higher power is achieved as q increases. These results agree qualitatively with the 
results presented in [8]. 

III. Fusion of Graph Features 

We will consider two weighting methods for fusion of the graph features introduced in Section II. Our
fusion test statistic is given by

$$S_w(t) = \sum_{i=1}^{d} w_i(t) S_i(t),$$

where $d$ is the number of graph features ($d = 9$ for our investigations).

A. Weighting 

The naive equal weighting scheme is given by

$$w_i(t) = 1/d$$

for all $i$ and $t$.

[Figure 4 panel labels: numtri, cc, apl]

Fig. 4. Density estimates for $M = 10{,}000$ Monte Carlo replicates of $S_i(t)$ in the first approximation model. $G(t) = ER(n =
50, p = 0.01)$ for $t = 1, \ldots, t^* - 1$ and $G(t^*) = \kappa(n = 50, p = 0.01, m = 6, q = 0.3)$. For each invariant, black denotes
$H_0: S_i(t^* - 1)$ and red denotes $H_A: S_i(t^*)$.

Our adaptive weighting scheme uses

$$w_i(t) = \frac{|S_i(t) - \mu_i(t)|}{\sigma_i(t)},$$

where $\mu_i(t)$ and $\sigma_i(t)$ are the mean and the standard deviation of $S_i(t^* - 1)$ over $M$ Monte Carlo replicates.
(Due to our temporal normalization, all features have approximately mean zero and variance one when
the "recent past" is stationary, which is the assumption when testing for change at time $t$.) A
detailed algorithm for this approach is given in Algorithm III.1.

Notice that the adaptive weights are a function of the graph G(t) being tested (line 6 of the algorithm). 
This implies that the features with larger deviations from the norm get higher weights and contribute 
more to the inference. 

B. Examples 

A graphical example is illustrated in Figures 6 and 7. In Figure 6, each point represents a Monte Carlo
replicate of the time series of graphs in two-dimensional Euclidean space using the first two features (size and
maximum degree). The black points (circles) are $H_0: S_i(t^* - 1)$, and the colored points are $H_A: S_i(t^*)$; the
points above the detection boundaries (critical values in Algorithm III.1, line 8) are colored green ("+"
symbols), and their proportion represents the power of the test. Notice that this boundary is linear for equal
weighting but not for adaptive weighting. For equal weighting, the boundary is calculated with the same
weights for all $S_i(t^* - 1)$ points; the slope of the line is always $-1$ and the intercept is determined by the
significance level of the test (i.e., $ax + by > c$ with $a = b = 1/d$, so $y > -x + dc$, where $c = cv$).
For adaptive weighting, meanwhile, the color of each $S_i(t^*)$ point is determined by the distance
from that point to $\mu_0$, the mean vector of $S_i(t^* - 1)$; the points whose fused values exceed the
critical value are colored green. This means that every $S_i(t^*)$ point gets a different weight and
therefore the detection boundary is not linear. Figure 7 shows the adaptive weighting case for various
values of $q$. As $q$ increases, there are more green points, which implies higher power, as expected.

Fig. 5. Statistical power for our nine graph features in the first approximation model. $G(t) = ER(n = 50, p = 0.01)$ for
$t = 1, \ldots, t^* - 1$ and $G(t^*) = \kappa(n = 50, p = 0.01, m = 6, q)$, for $q \in \{0.2, 0.3, 0.4, 0.5\}$ and allowable Type I error rate
$\alpha = 0.05$, based on $M = 10{,}000$ Monte Carlo replicates. The error bars represent $1.96 \times$ standard error for the sample means.

IV. Fusion Experiments 

A. Simulations 
Algorithm III.1 Hypothesis Test using Adaptive Weighting Fusion
Require: $S_i(t)$: $M \times t_{max} \times d$ normalized feature matrix, $t^*$

1: $S_i(t^* - 1)$ ← $M \times d$ matrix for null at time $t^* - 1$ from $S_i(t)$, and
   $S_i(t^*)$ ← $M \times d$ matrix for alternative at time $t^*$ from $S_i(t)$
2: $\mu_0(t)$ ← $1 \times d$ mean vector of $S_i(t^* - 1)$, and
   $\sigma_0(t)$ ← $1 \times d$ standard deviation vector of $S_i(t^* - 1)$ over $M$ Monte Carlo replicates
3: pwr ← 0
4: for all replicate $j$ such that $1 \le j \le M$ do
5:   $x$ ← $S_i(t^*)[j, ]$ {single replicate of $S_i(t^*)$}
6:   $w$ ← $|x - \mu_0(t)| / \sigma_0(t)$ {$1 \times d$ weight vector}
7:   $S_w(t^* - 1)$ ← $\sum_i w_i S_i(t^* - 1)$ {$1 \times M$ fused null vector}
8:   cv ← quantile($S_w(t^* - 1)$, 0.95) {critical value: 95% quantile}
9:   $S_w(t^*)$ ← $\sum_i w_i x_i$ {fused scalar of $x$}
10:  if $S_w(t^*) > cv$ then
11:    pwr ← pwr + 1
12:  end if
13: end for
14: return pwr/$M$, power of the test
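The steps of Algorithm III.1 can be sketched in Python as follows, alongside the equal-weighting baseline for comparison. The sketch assumes NumPy and takes the two $M \times d$ matrices of line 1 as inputs; the function names, the use of the sample standard deviation, and the synthetic Gaussian inputs in the example are assumptions of this illustration, not the authors' code or data.

```python
import numpy as np

def adaptive_fusion_power(S_null, S_alt, alpha=0.05):
    """Estimated power of the adaptive-weighting test; inputs are (M, d) arrays."""
    mu0 = S_null.mean(axis=0)                      # line 2: null mean vector
    sd0 = S_null.std(axis=0, ddof=1)               # line 2: null sd vector (sample sd assumed)
    rejections = 0
    for x in S_alt:                                # lines 4-5: one replicate at a time
        w = np.abs(x - mu0) / sd0                  # line 6: weights depend on the tested replicate
        fused_null = S_null @ w                    # line 7: fused null values under these weights
        cv = np.quantile(fused_null, 1 - alpha)    # line 8: critical value
        if x @ w > cv:                             # lines 9-10: fused statistic vs. critical value
            rejections += 1
    return rejections / len(S_alt)                 # line 14

def equal_fusion_power(S_null, S_alt, alpha=0.05):
    """Estimated power with the naive weights w_i = 1/d."""
    d = S_null.shape[1]
    w = np.full(d, 1.0 / d)
    cv = np.quantile(S_null @ w, 1 - alpha)
    return float(np.mean(S_alt @ w > cv))

# Synthetic stand-ins for normalized features (not simulation output from the paper).
rng = np.random.default_rng(0)
S_null = rng.normal(size=(500, 3))
S_alt = rng.normal(size=(500, 3)) + 10.0           # a very strong shift in every feature
power_adaptive = adaptive_fusion_power(S_null, S_alt)
power_equal = equal_fusion_power(S_null, S_alt)
```

Because the critical value is recomputed for each tested replicate, the adaptive test costs one pass over the null matrix per replicate, whereas the equal-weighting test needs only a single quantile.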



The simulation setup of this experiment is the same as the one in Section II-C except that fusion of 
graph features is applied. The performance of fusion with all nine features is depicted as horizontal lines 
in Figure 8. In all cases, the fusion lines are above the corresponding individual bars, and the adaptive 
weighting fusion lines are above the equal weighting fusion lines. 

Figure 9 depicts power as a function of fusion dimension for the different weighting schemes for the
three models in [1]. Given a fusion dimension $d'$, all $\binom{d}{d'}$ possible combinations of features are considered
for the fusion and the best performance is plotted. The difference in performance among the three models
in [1] is minimal ("qualitatively similar"), while the superiority of the adaptive weighting scheme (with
△ symbol) is apparent. Table I depicts the actual weightings obtained via the adaptive weighting scheme
for $d' = 4$. We see that, for the most part, the same features are selected for all three models in [1].

In Figure 10 we present a statistical power plot of fusion using all nine features ($d' = d = 9$) with
$q = 0.3$ and $\alpha = 0.05$ as a function of the rate parameter $r$ for the vertex processes.³ These results
demonstrate that (1) adaptive weighting is superior to equal weighting, (2) the second approximation is
more faithful to the exact model than is the first approximation, and (3) both approximations are accurate
for large $r$.

³ The parameter $r$ controls the variability of the latent stochastic processes $\{X_v(t)\}$ for the vertices. In particular, a large
value of $r$ corresponds to small variability in $\{X_v(t)\}$ (the second-order approximation), and as $r \to \infty$ the processes $\{X_v(t)\}$
converge to the stationary probability vectors $\pi_0$ and $\pi_A$ (the first-order approximation). See [1] for details. We have used
$r = 1024$ for all other results presented herein.

November 1, 2012 DRAFT

(a) Equal Weighting (b) Adaptive Weighting

Fig. 6. Scatter plots for size versus maximum degree for each fusion technique. Each point represents a Monte Carlo replicate.
The black points (circles) are $S_i(t^* - 1)$, and the colored points are $S_i(t^*)$; the points above the detection boundaries (critical
values) are colored green ("+" symbols). The ratio of the number of green points to the total number of green and red points
represents the power of the test: power = 0.457 for equal weighting and power = 0.564 for adaptive weighting. Blue
lines represent detection boundaries, which provide quantitative rejection regions.

TABLE I
The estimated weightings obtained via the adaptive weighting scheme for $d' = 4$ from Figure 9. We see
that, for the most part, the same features are selected for all three models in [1].

model        argmax_i     w_1    w_2    w_3    w_4
1st approx   (1,2,6,7)    2.66   0.86   1.30   0.10
2nd approx   (1,2,6,7)    2.24   3.88   4.62   0.11
exact        (1,2,6,7)    1.25   5.14   6.01   13.9

(c) q = 0.4 (d) q = 0.5

Fig. 7. Scatter plots for size vs. maximum degree for adaptive weighting for $q \in \{0.2, 0.3, 0.4, 0.5\}$. Each point represents a
Monte Carlo replicate. The black points (circles) are $S_i(t^* - 1)$, and the colored points are $S_i(t^*)$; the points above the detection
boundaries (critical values) are colored green ("+" symbols). The actual powers of the test are 0.332, 0.564, 0.775, and
0.917, respectively. As $q$ increases, there are more green points ("+" symbols), which implies higher power. Blue lines represent
detection boundaries, which provide quantitative rejection regions.
Fig. 8. Statistical power for our nine graph features and two fusion schemes in the first approximation model. $G(t) = ER(n =
50, p = 0.01)$ for $t = 1, \ldots, t^* - 1$ and $G(t^*) = \kappa(n = 50, p = 0.01, m = 6, q)$, for $q \in \{0.2, 0.3, 0.4, 0.5\}$ and allowable
Type I error rate $\alpha = 0.05$, based on $M = 10{,}000$ Monte Carlo replicates. The horizontal lines indicate the power using fusion
statistics $S_w(t)$ with $d' = 9$. The error bars represent $1.96 \times$ standard error for the sample means. The superiority of adaptive
weighting (solid lines) over equal weighting (dashed lines) is apparent.



B. Enron Email Data 

We use the Enron email data used in [7] for this experiment. The nine features, $S_i(t)$ for $1 \le t \le 189$,
are calculated for graphs derived from email messages among $n = 184$ executives during one-week
periods. Figure 11 depicts histograms of $S_i(t)$ for each $i$.

Our interest is the "alias" detection identified at week 132 in [7], when an employee changes his/her
email address. Therefore, we choose $t^* = 132$, the third week of May 2001. Figure 12 depicts scatter
plots of $S_i(t)$ for $t \in \{1, \ldots, 132\}$ for various pairs of invariants, where $S_i(t^*)$ is shown in red. Unlike
the simulation in Figure 7, Monte Carlo replicates of graphs are not available for real data; therefore the
131 previous graphs (shown as black points in the figure) are used to determine detection boundaries.
This investigation reveals that the combination of size and maximum degree allows detection based on
$S_i(t^*)$ for both weighting schemes (the red point is above both critical lines, in panel a), while only the
adaptive weighting scheme detects the anomaly for the other three feature pairs depicted (panels b, c, d).
Fig. 9. Statistical power plots for fusion statistics for the three models in [1] as a function of fusion dimension when $q = 0.3$,
$M = 10{,}000$, and $\alpha = 0.05$. The error bars represent $1.96 \times$ standard error for the sample means. The fusion dimensions
($d'$) are chosen from the best possible combinations. The difference in performance among the three models is minimal. The
adaptive weighting scheme (with △ symbol) is superior to equal weighting.



The performance of the equal and adaptive weighting fusion methods over all possible combinations of
features at $t^* = 132$ is summarized in Table II. For example, when the fusion dimension is $d' = 2$, there
are 36 possible feature combinations; both equal and adaptive weighting detect the anomaly in 24 of them,
and adaptive weighting alone detects it in 5 additional cases. Note that there is no case in which equal
weighting detects the anomaly while adaptive weighting does not.

V. Discussion 

We have demonstrated, via simulation results using a latent process model for time series of graphs 
as well as illustrative experimental results for a time series of graphs derived from the Enron email data, 
that an adaptive weighting methodology for fusing information from graph features provides superior 
inferential efficacy for a certain class of anomaly detection problems. 

One notable implication of this work is that inferential performance in the mathematically tractable
approximation models in [1] does indeed provide guidance for methodological choices applicable to the
exact (realistic but intractable) model. Furthermore, to the extent possible, we may tentatively conclude
that model investigations have some bearing on real data applications.

Fig. 10. Statistical power as a function of rate parameter $r$ for models in [1] and both weighting schemes based on $M = 10{,}000$
Monte Carlo replicates, with $d' = d = 9$, $q = 0.3$, and $\alpha = 0.05$. The horizontal lines represent results for the first approximation
($r \to \infty$) ± three standard deviations for adaptive weighting (upper line, at power approximately 0.56) and equal weighting
(lower line, at power approximately 0.45).

TABLE II
The performance of equal and adaptive weighting fusion methods on Enron email graphs. For example,
when the fusion dimension $d' = 2$, the possible number of combinations of feature dimensions is 36, and
both equal and adaptive weighting methods can detect 24 cases, but only adaptive weighting can
detect 5 additional cases.

An important extension of this work will be to time series of weighted and/or attributed graphs,
where message count and/or content is used to augment edges with (categorical) "topic" attributes; the
authors of [18]-[20] demonstrated that using content and context together provides superior inferential
capability when compared to either alone for a number of inferential tasks. Combined with the fusion
technique introduced in this paper, changes in communication content, in addition to excessive
communication probability, can aid detection.

Fig. 11. Enron email data histograms of $S_i(t)$ for 189 weeks.